A pattern matching approach is proposed for coding of two-level pictures.
Patterns, which are either symbols such as characters, or fractions of black 
regions, such as line segments, are extracted from the facsimile. They are 
compared and matched to already transmitted patterns, called library patterns. 
If a correct match is detected, only the position of the pattern and the 
identification of the matching library pattern are transmitted. If a pattern 
does not match any library pattern, it is added to the library and its binary 
description is transmitted. Compared to conventional two-dimensional codes, 
the compression is often doubled and is sometimes 4.5 times higher. Compared
to a symbol-matching coding technique (Ref. 2), the compression has increased
by 20 to 80 percent, depending upon the document.

I. INTRODUCTION 

Conventional two-level picture coding techniques are based on the
statistical dependence between neighboring picture elements (pels) (Ref. 1).
The calculation of entropies, according to a local source model, gives 
the maximum achievable bit rates. Run length or predictive coding 
techniques or a combination of them takes advantage of the statistical 
dependence between neighboring pels and leads to bit rates close to 
the entropy. Each exploits what can be called the microscopic (pel) 
properties of a facsimile. 

Bell Laboratories.

Copyright 1983, American Telephone & Telegraph Company. Photo reproduction for
noncommercial use is permitted without payment of royalty provided that each
reproduction is done without alteration and that the Journal reference and
copyright notice are included on the first page. The title and abstract, but
no other portions, of this paper may be copied or distributed royalty free by
computer-based and other information-service systems without further
permission. Permission to reproduce or republish any other portion of this
paper must be obtained from the Editor.

Pattern-recognition coding techniques exploit macroscopic properties
of the facsimiles. The image source is a source of patterns such as
characters, lines, and black spaces. We can code the facsimile more 
efficiently, since the description is closer to the perceptual level. We 
can consider two kinds of pattern-recognition coding techniques. The 
first technique is pattern (or image) understanding. It recognizes a 
certain pattern, for example a letter, that possibly includes some font 
information. The second technique is pattern matching. Here, a pat- 
tern is not recognized, but is simply matched with already transmitted 
patterns, and if a correct match is detected, it is replaced by the 
matching pattern. It does not use the image-understanding level. The 
image-understanding approach has the potential advantage of a very 
high compression, but the often important aesthetic details of the 
documents can be lost, and there is a risk of errors at the present level 
of such techniques. The matching approach yields lower compression, 
but keeps more of the original pictorial information. There are also 
lower risks of errors, since matching allows only slight modifications 
in the pattern shapes. Naturally, neither of the pattern-recognition 
techniques is lossless, since they modify the picture content. 

Ascher and Nagy (Ref. 3) and Pratt et al. (Ref. 2) have already proposed facsimile
coding techniques using matching techniques. In the system presented 
here, not only the symbols, as in Pratt's case, but also graphical 
elements such as line segments and black regions are matched. The 
patterns are efficiently coded and updated, leading to significantly 
higher compressions. 

II. SYSTEM DESCRIPTION 

Figure 1 shows the block diagram of the system. The pattern locator 
examines the facsimile line by line. When it locates a black pel, the 
pattern isolator picks up a pattern. The pattern is either a symbol 
(defined as a set of black pels completely surrounded by white pels) 
or, when no symbol can be extracted, a fraction of the black region. 
Therefore, contrary to Ref. 2, there is no residue to be coded, since all 
black pels belong to a pattern. 

The matcher makes a template matching of the incoming pattern, 
with existing library patterns to determine whether the incoming 
pattern is similar to an already transmitted pattern. The system 
screens the library patterns to reduce the time-consuming template 
matching. Thus, we consider only the patterns that might match the 
incoming pattern. We screen by comparing features of the library 
patterns with those of the incoming pattern. We apply a very efficient 
and simple two-pass screening. If a correct match is detected, the 
matcher sends the information about the position of the pattern and 
its library identification to the coder. If no match has occurred, the 
incoming pattern is added to the pattern library. The pattern library 
is empty at the beginning of the coding and is gradually built up by
the incoming library patterns. The matcher then also sends the infor- 
mation about the position and description of the new library patterns 
to the coder. 

A library update and management unit takes care of the addition 
and deletion of library patterns and organizes them for the quickest 
possible match and most efficient coding. All the patterns isolated 
along one line are stored in the coder. When the end of the line is 
reached, we sort the patterns, which allows a more efficient coding. 
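
The match-or-add flow described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `encode_patterns` and `matches` are hypothetical, and exact equality stands in for the screening and template matching described later.

```python
def matches(pattern, library_pattern):
    # Placeholder matcher (assumption): exact equality stands in for the
    # screening and template matching steps of the real system.
    return pattern == library_pattern


def encode_patterns(located_patterns):
    """located_patterns: iterable of (position, pattern) in raster order.

    Returns the list of messages handed to the coder and the final library.
    """
    library = []   # empty at the beginning of the coding
    messages = []
    for position, pattern in located_patterns:
        for ident, library_pattern in enumerate(library):
            if matches(pattern, library_pattern):
                # Match: only position and library identification are sent.
                messages.append(("match", position, ident))
                break
        else:
            # No match: the pattern joins the library and is sent in full.
            library.append(pattern)
            messages.append(("new", position, pattern))
    return messages, library
```

In this toy version the library only grows; deletion and reordering belong to the library update and management unit.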

III. LOCATION AND ISOLATION OF PATTERNS 

Patterns, in the present context, are the primitive elements of the 
coding process. They are isolated, and sent to the matching block 
sequentially, in a raster order. We distinguish two classes of patterns, 
relative to a square window of a predetermined size, W. 

1. A symbol is defined as a connected region consisting of black pels 
and completely surrounded by white pels, such that it can completely 
fit into the window. 

2. A nonsymbol is defined as a windowed portion of a black con- 
nected region that is larger than the window. 
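
The two definitions can be illustrated with a small sketch. It assumes 8-connectivity and a picture stored as a list of rows of 0/1 values; the function name `classify_region` is hypothetical. Note that it uses a flood fill over the whole region, which is not the one-pass boundary-tracing isolation the paper actually uses.

```python
from collections import deque


def classify_region(picture, seed_x, seed_y, window):
    """Return 'symbol' if the 8-connected black region containing the seed
    fits in a window x window square, otherwise 'nonsymbol'."""
    height, width = len(picture), len(picture[0])
    seen = {(seed_x, seed_y)}
    queue = deque([(seed_x, seed_y)])
    min_x = max_x = seed_x
    min_y = max_y = seed_y
    while queue:
        x, y = queue.popleft()
        # Grow the bounding box of the region visited so far.
        min_x, max_x = min(min_x, x), max(max_x, x)
        min_y, max_y = min(min_y, y), max(max_y, y)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < width and 0 <= ny < height
                        and picture[ny][nx] == 1 and (nx, ny) not in seen):
                    seen.add((nx, ny))
                    queue.append((nx, ny))
    fits = (max_x - min_x + 1 <= window) and (max_y - min_y + 1 <= window)
    return "symbol" if fits else "nonsymbol"
```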

Usually, characters and small graphics elements can be represented 
as symbols, while lines and larger figures can be decomposed into 
nonsymbols. The decomposed figures can be later reconstructed by 
taking the union of the nonsymbols. The nonsymbols do not have to 
be disjoint, and a better compression may sometimes result from a 
decomposition into overlapping symbols. 

Decomposing large figures into nonsymbols allows us to use match- 
ing techniques to compress graphical information, as well as text. A 
figure can be decomposed in many ways, and the compression that 
results from grouping similar nonsymbols usually depends on the 
decomposition. The final compression, or the number of different 
classes of nonsymbols, can be used as a measure of quality of the 
decomposition, and one may try to find the best decomposition with
respect to such measures. Finding the optimal decomposition, however,
may be computationally quite complex (we do not know of any related 
study) and it would certainly require many passes through a figure. At 
present, we use a one-pass isolation procedure, which allows us to 
keep the computation within reasonable bounds. 

The isolation procedure repeatedly isolates and removes the upper- 
left portion of a black region, up to a maximum size allowed by the 
window. If the isolated pattern has no black pel extensions, then it is 
a symbol; otherwise it is a nonsymbol. 

The isolation algorithm operates on a two-dimensional one-bit array 
containing the original picture. The picture memory is scanned line 
by line from the upper-left element. When a black pel is found, the
procedure attempts to trace the boundary of a black region, clockwise.
The tracing algorithm is a standard one; however, we describe it here
for further reference. Let us call the first black pel (x_1,y_1). The
neighbors (adjacent pels in eight directions) of (x_1,y_1) are examined,
beginning at (x_1+1,y_1) and searching clockwise around (x_1,y_1) up
to (x_1-1,y_1+1). If a black pel is found, it becomes the second pel of
the contour, (x_2,y_2); otherwise (x_1,y_1) is erased from the picture
memory (single pels are neglected) and the scan continues. Each
subsequent pel of the contour is found by searching around the current
pel (x_i,y_i), beginning two steps clockwise from the previous pel
(x_{i-1},y_{i-1}) (Fig. 2). The contour trace ends when it returns to the
first pel in such a way that the next pel would be (x_2,y_2). The tracing
algorithm checks for the limits of the picture array and it maintains a
window. Pels beyond the limits of the picture array and those outside
of the current window are always treated as white (0 valued). The
purpose of the window is to restrict the maximal size of an isolated
pattern to W × W. The window is initially set to a size 2W × W, and
positioned in such a way that (x_1,y_1) is in the center of its upper edge.
When the traced part of the boundary reaches a width of W, the
window is reset to a size W × W, and it is placed over the boundary
part that has been traced, such that (x_1,y_1) is still at the upper edge of
the window (Fig. 3).
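
The tracing rule above can be sketched in a few lines. This is a simplified illustration, not the authors' code: it assumes the picture is a list of rows with y increasing downward (so that east, south-east, south, ... is clockwise on the page), it omits the window logic entirely, and the name `trace_contour` is hypothetical. The starting pel is assumed to be the first black pel of its region in raster order.

```python
# Eight neighbor offsets in clockwise order (y grows downward), starting east.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def trace_contour(picture, x1, y1):
    """Trace the boundary of the black region whose first pel (in raster
    order) is (x1, y1). Returns the list of contour pels in visiting order."""
    height, width = len(picture), len(picture[0])

    def black(x, y):
        # Pels beyond the picture limits are treated as white.
        return 0 <= x < width and 0 <= y < height and picture[y][x] == 1

    def search(x, y, start):
        # Look clockwise around (x, y), beginning at direction index `start`.
        for k in range(8):
            d = (start + k) % 8
            nx, ny = x + DIRS[d][0], y + DIRS[d][1]
            if black(nx, ny):
                return d, (nx, ny)
        return None

    first = (x1, y1)
    hit = search(x1, y1, 0)           # begin at (x1 + 1, y1)
    if hit is None:
        return [first]                # single pel; the caller may erase it
    prev_dir, second = hit
    contour = [first]
    current = second
    while True:
        back = (prev_dir + 4) % 8     # direction pointing to the previous pel
        next_dir, nxt = search(current[0], current[1], (back + 2) % 8)
        if current == first and nxt == second:
            break                     # back at the start, about to repeat
        contour.append(current)
        prev_dir, current = next_dir, nxt
    return contour
```

A pel can appear more than once in the result when the region is one pel wide, since the boundary then passes over it in both directions.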

Fig. 3 - Window positioning for isolation.

Fig. 4 - Contour encoding in array S. Legend: white pels, black pels, and
pels that are set to 1 in array S.

The tracing of the boundary is recorded in a two-dimensional one-bit
array S in the following way. When the search around the current
boundary pel (x_i,y_i) goes past the pel (x_i+1,y_i), a 1 is put in
S(x_i+1,y_i). If the search goes past the element (x_i-1,y_i), then a 1 is
put in S(x_i,y_i). All the elements of S are initially set to 0. The
information in S (Fig. 4), after the trace termination, completely
represents the boundary (it is a form of run-length code). The pattern
now can be isolated by copying and erasing the portion of the picture
that is enclosed by the boundary (including the boundary). This is
accomplished using the information in the array S. For any row of S, let
S_1, S_2, ..., S_n be the positions (x-coordinates) of the 1-valued
elements in that row. The number n is always even, which is a property of
the boundary encoding that we use. For every row of S, the pels of the
corresponding row of the picture memory between S_1 and S_2, S_3 and S_4,
etc., are copied to another array and set to 0 in the picture memory,
including S_1, S_3, ... and excluding S_2, S_4, .... The pattern is now
isolated and erased from the picture memory. While the isolation
algorithm described above always works correctly, i.e., it isolates
symbols and completely decomposes large figures into nonsymbols, it does
not attempt in any way to optimize the decomposition, so the results are
not always pleasing.
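
The copy-and-erase step driven by array S can be sketched as below. The sketch assumes S has already been filled in by the tracing phase and has the same dimensions as the picture; the function name `isolate_with_s` is hypothetical.

```python
def isolate_with_s(picture, s):
    """Copy and erase the pels lying between paired 1-valued positions in
    each row of s. Each pair (start, stop) covers start inclusive up to
    stop exclusive, as in the text. The picture is modified in place."""
    pattern = [[0] * len(row) for row in picture]
    for y, s_row in enumerate(s):
        marks = [x for x, value in enumerate(s_row) if value == 1]
        if len(marks) % 2 != 0:
            raise ValueError("row %d of S has an odd number of marks" % y)
        for start, stop in zip(marks[0::2], marks[1::2]):
            for x in range(start, stop):
                pattern[y][x] = picture[y][x]   # copy to the pattern array
                picture[y][x] = 0               # erase from picture memory
    return pattern
```

Because only the spans between paired marks are touched, black pels of other regions on the same row stay in the picture memory.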

To improve the decomposition in cases commonly occurring in 
graphics, we have added two extensions to the basic isolation scheme: 

1. L-pattern suppression 

L-pattern suppression improves the segmentation of large blobs that 
otherwise may generate many dissimilar nonsymbols (Fig. 5). This 
extension is implemented in the tracing phase of the isolation algo- 
rithm as follows: If the beginning part of the traced boundary goes 
straight down from either first or second pel over more than k (cur- 
rently k = 10) pels, then an attempt to turn immediately to the right 
resets the lower edge of the window to the last pel before the right 
turn, so the boundary is forced to turn left (see Fig. 6). 

2. Cross decomposition 

If the isolated pattern can be represented as an intersection of a
horizontal and a vertical line segment (a cross), then each segment
becomes a separate pattern. This is implemented by comparing each
isolated pattern (with the matching technique described in Section IV)
to a cross formed by intersecting this pattern with vertical and
horizontal lines one pel from the edges of the final window (Fig. 7). If
a sufficiently close match is found, then one of the line segments from
the cross is returned to the picture memory, while the other replaces
the isolated pattern. This extension reduces the number of patterns
generated by line crossings in grids and tables (Fig. 8).

Fig. 5 - Improvement in segmentation due to L-pattern suppression.
(a) Before suppression. (b) After suppression.

Fig. 6 - L-pattern suppression. When tracing reaches the corner, it is
forced to follow the dashed line.

The basic isolation algorithm is similar to the region extraction
method of Dudani (Ref. 4), but in contrast to the latter it does not need
to store and process a list of boundary points, and it extracts regions
containing holes in one pass. This algorithm can be shown to work
correctly in every case and it is well suited to a hardware
implementation. The extensions of the basic algorithm are heuristic in
nature, but they considerably improve the decomposition of large
regions. Examples of such improvements are shown in Figs. 5 and 8.
Additional improvements may be possible at some increase of the
computational cost.

IV. MATCHING 

The matching includes all the processes necessary to know whether 
an incoming pattern matches any of the library patterns. In this 
system, we divide the matching into three parts. 

1. The screening unit makes a selection of the library patterns, and
directs for template matching only those library patterns that might
match.

2. The template matcher creates a new binary picture called the error
picture, containing black pels or 1's in the locations where the two
template-matched patterns are dissimilar.

3. The matching decision process uses the error pictures and other
information to decide whether a correct match has occurred.

Fig. 7 - Forming an intersection pattern.

Fig. 8 - Patterns resulting from grid segmentation. (a) Before cross
decomposition. (b) After cross decomposition.

4.1 Screening 

The purpose of the screening is to reduce the time-consuming task 
of the matcher. It should direct to the template matcher only the 
library patterns that might match the incoming (unknown) pattern. 
The screening is obtained by measuring some characteristics of the 
patterns, called features, and comparing them. The features must be 
easy to compute and compare, and also must form an easily classifiable 
space. The digitization of a facsimile adds much noise to a pattern. To 
get an efficient screening, the features must also be relatively noise 
independent. Four features were chosen for the screening. Two of 
them are obvious: the pattern length and the pattern height. The two 
others are the number of horizontal and the number of vertical white 
runs enclosed in the pattern. They are characteristics of the inside of 
a pattern, separating, for example, c from e or o. The chosen features 
are shown in Fig. 9. The straightforward feature "number of black 
pels" was found to be of little use because of its high variability and 
dependency upon the other features. 
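
A sketch of how the four features might be computed is given below. It assumes the pattern is already cropped to its bounding box and stored as a list of rows of 0/1 values, and it interprets an "enclosed" white run as a white run with black pels on both sides along the same row or column; that interpretation and the function names are assumptions.

```python
def enclosed_white_runs(line):
    """Count white runs in one row or column that have black pels on both
    sides (leading and trailing white runs are not counted)."""
    count = 0
    seen_black = False
    in_white = False
    for pel in line:
        if pel == 1:
            if seen_black and in_white:
                count += 1          # a white run just closed by a black pel
            seen_black = True
            in_white = False
        else:
            in_white = True
    return count


def screening_features(pattern):
    """Return the four screening features of a cropped binary pattern."""
    columns = [list(col) for col in zip(*pattern)]
    return {
        "length": len(pattern[0]),   # horizontal extent, the paper's "length"
        "height": len(pattern),
        "h_runs": sum(enclosed_white_runs(row) for row in pattern),
        "v_runs": sum(enclosed_white_runs(col) for col in columns),
    }
```

On a tiny 3 × 3 ring (an "o") the horizontal count is 1, while the same shape with its right side opened (a "c") gives 0, which is the kind of separation the text describes.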

The screening process also must decide in which order to send the 
library patterns to the matcher. The most probable match should be 
sent first, to reduce the number of matches. The probability of a match 
between patterns depends not only on the similarity of their features, 
but also on the probability of occurrence of a library pattern. For
example, an incoming pattern having the same feature distance to an
O and a Q is much more likely to match the O than the Q, since O is
much more frequent than Q. The screening takes into account both
the feature similarity and the probability of occurrence of a library 
pattern. We consider the probability of occurrence by sorting the 
library patterns according to the number of times they have matched 
(see Section 5.2.1). We take the feature distance into account by 
allowing for each feature only a fixed margin between the two patterns. 
The margin must be wide enough not to preclude any correct match 
and tight enough to reduce the number of template matches. A two- 
pass screening was found very efficient. In the first screening, only 
library patterns with features very similar to those of the incoming 
patterns are sent to the template matcher. A second, much looser, 
screening is applied only in the few cases where no match occurred. 
Fig. 9 - Features chosen for screening (the pattern shown has HEIGHT = 15).
Horizontal lines indicate which runs are included in the count of
horizontal runs. Vertical lines indicate which runs are included in the
count of vertical runs. Horizontal run count is six and vertical run
count is 15.



The screening and the sorting are very efficient in reducing the number 
of matches. For example, for a typewritten document, the average 
number of matches per incoming pattern is reduced to 2.5, compared 
to 25 without screening and sorting. 
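
The two-pass screening with usage-based ordering might look like the sketch below. The margin values (`tight=1`, `loose=4`), the data layout, and the function names are assumptions chosen for illustration; the paper does not give the margins it uses.

```python
def screen(library, incoming, margin):
    """library: list of (ident, features, match_count) tuples.

    Return the identifiers of library patterns whose every feature lies
    within `margin` of the incoming pattern's, most often matched first."""
    candidates = [
        (count, ident)
        for ident, features, count in library
        if all(abs(features[k] - incoming[k]) <= margin for k in incoming)
    ]
    candidates.sort(key=lambda item: -item[0])   # most probable match first
    return [ident for _, ident in candidates]


def two_pass_match(library, incoming, template_match, tight=1, loose=4):
    """Try a tight screening first; fall back to a looser one only when no
    match occurred. `template_match(ident)` is a caller-supplied predicate."""
    tried = set()
    for margin in (tight, loose):
        for ident in screen(library, incoming, margin):
            if ident in tried:
                continue             # already template matched in pass one
            tried.add(ident)
            if template_match(ident):
                return ident
    return None
```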

4.2 Template matching 

The template matcher creates a new picture, called the error picture,
which contains 1's in the locations where the two patterns are
different. The error picture is obtained simply by superimposing the two
patterns and taking the exclusive-or of the corresponding pels. Figure
10 is an example of matching two patterns of the same character,
while Fig. 11 shows the matching of two unlike patterns. Two patterns
are always matched nine times, allowing the displacement of one
pattern compared to the other by ±1 in both the horizontal and
vertical directions.
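
The exclusive-or and the nine displacements can be sketched as follows. The sketch assumes patterns are lists of rows of 0/1 values aligned at their upper-left corners and treats pels outside a pattern as white; the names `error_picture` and `best_shift` are hypothetical. Ranking the shifts by plain error count is used here only to pick a relative position, not to decide whether the match is correct.

```python
def error_picture(a, b, dx=0, dy=0):
    """XOR pattern a with pattern b displaced by (dx, dy). The result has a
    one-pel border so that displaced pels are never cut off."""
    height = max(len(a), len(b)) + 2
    width = max(len(a[0]), len(b[0])) + 2

    def pel(pattern, x, y):
        inside = 0 <= y < len(pattern) and 0 <= x < len(pattern[0])
        return pattern[y][x] if inside else 0

    return [[pel(a, x - 1, y - 1) ^ pel(b, x - 1 - dx, y - 1 - dy)
             for x in range(width)]
            for y in range(height)]


def best_shift(a, b):
    """Try the nine displacements of b and return (error_count, dx, dy)
    for the one with the fewest error pels."""
    scored = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            errors = error_picture(a, b, dx, dy)
            scored.append((sum(map(sum, errors)), dx, dy))
    return min(scored)
```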

4.3 Matching decision 

The matching decision unit must process the error picture to detect 
whether there is a correct match, and to decide which relative position 
of the library pattern gives the best match. 

The straightforward approach is to count the number of errors (or
1's) in the error picture and to threshold it to make the decision. Such
a technique would lead to many mismatches or many undetected
matches, since, as shown in Ref. 2, the error count for two patterns
corresponding to the same character is sometimes higher than the
count for two patterns corresponding to different characters. This is
caused by the digitization noise. Figure 10 shows that the template
matching of two patterns of the same character gives relatively
randomly distributed errors. Figure 11 shows that in the case of patterns
of different characters, a cluster of errors appears where there are
morphological differences between patterns.

Fig. 10 - Template matching of two similar patterns, with (a) and (b)
original patterns and (c) error picture.

Fig. 11 - Template matching of two different patterns, with (a) and (b)
original patterns and (c) error picture.

As Ref. 2 shows, we could apply a weighted error count where the
weight of an error is equal to the number of error pels among its eight
neighbors. Single errors are erased and the maximum weight is eight.
Figure 12 gives the weighted error pictures from the error pictures of
Figs. 10 and 11. The weighted error count is not sufficient for the
matching decision, as shown by Fig. 13. We must look at local error
patterns to make the decision. The reason is that it is the local
characteristics of the pattern that indicate whether two patterns are
the same. Therefore, any decision made upon a count or integration
may be incorrect. The matching decision described below uses only
local measures and is also made locally with the simple rule that the
match is considered correct if no local rejections are detected during a
template matching.

Fig. 12 - Weighted error pictures. (a) Weighted error count is 18 in
Fig. 10 and (b) 144 in Fig. 11.

Fig. 13 - A weighted error count matching criterion leads to a mismatch,
with (a) and (b) original patterns, and (c) weighted error picture.

The following decision rule is applied. A match is rejected if:
Condition 1: An error pel has a weight of 4 or more, or
Condition 2: (a) an error pel has a weight of 2 or more, (b) at least
two of its neighboring error pels are not connected, and (c) one of the
two pels from the patterns used to obtain the error pel has a weight
of 0 or 8 (corresponding to 0 or 8 surrounding black pels).

Most mismatches are detected by Condition 1, but Condition 2 is 
necessary in order to reject, for example, the possible match of an e 
and a c shown in Fig. 13. It is easy to see that Condition 2a is not 
necessary since it is included in 2b, but Condition 2a reduces the 
computation. 
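
The error weights and Condition 1 can be sketched as below. Condition 2 is deliberately left out, because its connectivity test is not specified in enough detail here to implement without guessing; the function names and the `threshold` parameter are assumptions.

```python
def error_weights(err):
    """Weight of each error pel = number of error pels among its eight
    neighbors. Non-error pels, and isolated error pels, get weight 0."""
    height, width = len(err), len(err[0])
    weights = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if err[y][x]:
                weights[y][x] = sum(
                    err[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                    if (nx, ny) != (x, y))
    return weights


def rejected_by_condition_1(err, threshold=4):
    """Condition 1 only: reject when some error pel has weight >= 4."""
    return any(weight >= threshold
               for row in error_weights(err) for weight in row)
```

Scattered single errors all get weight 0 and pass, while a solid 3 × 3 cluster of errors is rejected because its center pel has weight 8.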

With these matching criteria, no visible mismatches have been
detected, except slight distortion in line drawings. It is important to
note that a rejection can often be detected after processing a small
fraction of the error picture. A matching decision made at the same
time as the template matching would lead to an early abort of
template matchings and thus reduce the computation.

When a correct match is detected, several relative positions some- 
times give a correct match. The chosen relative position will be the 
one with the lowest error count. The best relative position will decide 
where the library pattern will be put to replace the incoming pattern. 

V. CODING 

Contrary to many conventional facsimile coding techniques, we 
must code several different kinds of events and design several separate 
code books. The code for a pattern includes the position and the 
description of the pattern. The description is usually its library iden- 
tification, or in the case of a new pattern, its complete description. 
The coding procedure is described here for the size of the International
Telegraph and Telephone Consultative Committee (CCITT) test
facsimiles having 1728 pels per line and 2376 lines, but it can easily be
modified for other cases.

5.1 Coding of the position of the pattern

To obtain a good-quality reproduction with pattern matching, we
must position the patterns accurately. Considering the CCITT test
documents, 23 bits are necessary for an absolute fixed-length coding
(11 bits horizontally, 12 bits vertically). We choose to transmit the
horizontal position uncoded (11 bits) because variable-length
run-length coding would lead only to a slightly smaller coding length
(typically 1 to 1.5 fewer bits/pattern), since the horizontal distance
between patterns is large. Also, since the absolute horizontal position
is coded, the patterns can be transmitted in a nonsequential order,
which, as shown later, leads to a significant decrease in the average
coding length for the library identification code words. It should be
noted that with 1728 pels/line and an 11-bit code word, the code words
starting with 111 are not used and therefore can be used as special
code words.
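
The claim about the unused 111 prefix is easy to check: the smallest 11-bit word starting with 111 is 1792, which is above the largest position, 1727. A short sketch, with hypothetical names:

```python
PELS_PER_LINE = 1728
POSITION_BITS = 11
END_OF_LINE = "111"   # special code word; never a prefix of a position


def position_code(x):
    """Fixed-length 11-bit code word for a horizontal position."""
    if not 0 <= x < PELS_PER_LINE:
        raise ValueError("position outside the line")
    return format(x, "0{}b".format(POSITION_BITS))


# No valid position starts with 111, so a decoder reading the first three
# bits can tell an end-of-line marker from a position.
prefix_is_free = all(
    not position_code(x).startswith(END_OF_LINE)
    for x in range(PELS_PER_LINE))
```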

We code the vertical position of the patterns in the following way: 

1. A mode bit is sent at the beginning of each line to indicate 
whether there are any patterns starting on that line. 

2. If there are no patterns on the line, operation 1 is repeated on
the next line.

3. If there are patterns on a line, they are all coded. The special 
horizontal code word 111 indicates that there are no more patterns on 
the line and that the next line can be considered. 

4. When a pattern is replaced by a library pattern, the position of
the library pattern might be moved up or down by one line. Therefore,
after the library identification has been coded, the code words 10 and
11 are used to position the library pattern up or down, while code word
0 is used to indicate no vertical displacement. No vertical displacement
code word is sent with a new library pattern, since there are no changes
in vertical position.

Figure 14 shows examples of the message format for the pattern 
positioning. 

5.2 Coding of the pattern identification 

The coder must send a pattern identification word with each pattern. 
We can transmit the pattern number uncoded. It requires, for example, 
seven bits in the case of a library size of 128 and nine bits in the case 
of a library size of 512. The coding procedure used here will lead to an 
average coding length of the pattern identification of fewer than five 
bits/pattern. It will be obtained by a continuous library updating and 
by variable-length coding. 

5.2.1 Library updating and management

The library management and updating is done for the following 
purposes: 

1. Accept new library patterns, and if necessary, delete a seldom 
used library pattern to make room for the new one. 

2. Organize the library for the fastest match, taking into account 
the screening and matching procedures. 

3. Organize the library for minimum average library identification 
coding length. 

All three require the same processing: to keep track of the number
of times each library pattern is used. By ordering the library patterns
in order of decreasing usage, the correct match will be obtained rapidly,
since the most used library patterns will be accessed first. An efficient
coding of the patterns' identification is obtained by giving short code
words to the first patterns in the list. The last pattern in the list,
which is one of the least used patterns, can be deleted to make room
for a new one.

Fig. 14 - Coding of positions of patterns. Two lines have no patterns,
then a line has three patterns; the first on position 231 is replaced by
a library pattern, the second on position 1532 is a new library pattern.
There are no patterns on the next line.

The updating must be deterministic and use no future information, 
since the receiver must make the same updating to decode correctly. 

The updating rule of the patterns in the library is as follows: 

1. When a pattern matches a library pattern number K, that library
pattern is moved to number K/2 and all the pattern numbers from K/2
to K-1 are increased by 1.

2. When a new pattern is added to the library, it gets number N/2
where N is the total number of library patterns. The patterns with
numbers from N/2 to N will be increased by 1, and if N is equal to
the maximum number of library patterns M, the library pattern with
number N + 1 is dropped.

This updating procedure was found to efficiently give low identifi- 
cation numbers to often used patterns and high numbers to seldom 
used patterns. If M is the maximum number of library patterns, it 
guarantees that a new library pattern will stay in library for at least 
M/2 matches, but generally for many more. 
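
The updating rule can be sketched with a plain list whose index 0 holds pattern number 1. The sketch assumes K/2 and N/2 mean integer division with a minimum of 1, which the text does not state; the function names are hypothetical.

```python
def on_match(library, k):
    """Move the matched pattern from number k to number k // 2 (1-based).
    Patterns numbered k // 2 .. k - 1 move down by one position."""
    target = max(1, k // 2)
    pattern = library.pop(k - 1)
    library.insert(target - 1, pattern)


def on_new(library, pattern, max_size):
    """Insert a new pattern at number N // 2, where N is the current size,
    and drop the last pattern if the library exceeds max_size."""
    target = max(1, len(library) // 2)
    library.insert(target - 1, pattern)
    if len(library) > max_size:
        library.pop()
```

Both functions use only past information, so a receiver applying the same calls in the same order keeps an identical library.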

5.2.2 Pattern identification coding table 

The pattern identification coding table includes two special code 
words: "new pattern" and "same pattern." They are added to increase 
the coding efficiency. The "new pattern" code word is chosen because 
it is not necessary for a new library pattern to send an identification 
number, since the decoder uses the same rule as the coder to assign 
the identification number to the new pattern. The "same pattern" 
code word indicates that the transmitted pattern is the same as the 
previously transmitted pattern. It is useful particularly for typewritten 
text where the line-by-line search for a pattern often detects the same 
pattern (character). 

The coding table for the pattern identification is given in Table I
for a pattern library with a maximum of 512 patterns.

This code leads to an average library identification length of fewer
than seven bits, compared to nine bits with a fixed-length code. The next
section shows a more efficient coding procedure.

Table I - Coding table for identification of library patterns

Symbol                     Code Word       Code Word Length
Same pattern               000             3
Library pattern 1-16       1XXXX           5
New pattern                00100           5
Library pattern 17-32      010XXXX         7
Library pattern 33-64      0011XXXXX       9
Library pattern 65-128     00101XXXXXX     11
Library pattern 129-512    011XXXXXXXXX    12

5.2.3 Pattern identification coding by sorting

Since an absolute code gives the horizontal position of a pattern, it
is possible to transmit the patterns detected along a line in any order.
The only condition is that the library updating be done at the end of
the line. The average coding length of the library identification is
reduced to fewer than five bits, by sorting the patterns on a line
according to their library number. That is because:

1. Many of the patterns are the same. 

2. The library pattern identification number is run-length coded 
(only the increase compared to the previous identification number is 
coded). 

3. The new library patterns are sent at the end of the line; therefore, 
the new pattern code word is sent only once, since any more patterns 
are automatically new patterns. 

This can be illustrated by an example. Let a line have the following
patterns: pattern 23, new pattern; pattern 28, same; pattern 23, new
pattern. By looking at Table I, the coding length is 7 + 5 + 7 + 3 + 7
+ 5 = 34 bits. With sorting, the patterns become: pattern 23, same;
pattern 28, same; new pattern; new pattern. The coding length is 7 +
3 + 5 + 3 + 5 + 0 = 23 bits. It should be noted in this example that
pattern 28 is coded as pattern 5 since only the increase in identification
number compared to the previous pattern is coded.
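
The arithmetic of this example can be reproduced with a short sketch that uses the code word lengths of Table I. The event representation (library numbers and the string "new") and the function names are assumptions made for illustration.

```python
SAME, NEW = 3, 5   # lengths of the two special code words in Table I


def id_code_length(number):
    """Code word length from Table I for a library number (or an increase)."""
    for upper, length in ((16, 5), (32, 7), (64, 9), (128, 11), (512, 12)):
        if 1 <= number <= upper:
            return length
    raise ValueError("number outside 1..512")


def unsorted_length(events):
    """events: library numbers or 'new', in the order found along the line."""
    total, previous = 0, None
    for event in events:
        if event == "new":
            total += NEW
        elif event == previous:
            total += SAME
        else:
            total += id_code_length(event)
        previous = event
    return total


def sorted_length(events):
    """Sort matched patterns by library number, code only the increase,
    and send the 'new pattern' code word once at the end of the line."""
    idents = sorted(event for event in events if event != "new")
    total, previous = 0, None
    for ident in idents:
        if previous is None:
            total += id_code_length(ident)
        elif ident == previous:
            total += SAME
        else:
            total += id_code_length(ident - previous)
        previous = ident
    if any(event == "new" for event in events):
        total += NEW   # further new patterns cost no identification bits
    return total
```

For the line in the example, `[23, "new", 28, 28, 23, "new"]`, the two functions give 34 and 23 bits respectively.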

The library updating is done at the end of each line. This creates 
problems when accepting new library patterns. They must be added 
immediately to the top of the library, since the position of the other 
patterns should not be changed. It is also not possible to delete patterns 
to make room for the new ones. For that reason, before scanning a 
line, enough library patterns should be deleted to avoid an overflow of 
the pattern library. 

5.3 Coding of the library pattern description 

The size of a pattern is limited to 32 x 32 bits. The description
starts with a 5-bit word, which indicates the height, H, of the pattern
in binary. The length of a pattern is extended to 32 pels by filling
the right end with 0's, so there are 32 x H pels to code. For coding
efficiency, one white pel (0) is added at the beginning. A coding line
is made of the 32 x H + 1 pels considered in raster scan order. The
reference line is the same as the coding line except that all the pels
are shifted to the right by 32 pels (one line), so that each line of
the pattern is coded using the previous line as the reference. The
coding line is then coded by the CCITT two-dimensional code, 5 with the
only modification that the first code word, which is always the
horizontal mode code word, is deleted, since it carries no information.
For coding efficiency, switching is allowed between two modes for the
coding of the library pattern description. The first mode is as
described above and is called "horizontal coding." The other is called
"vertical coding" and is the same except that the pattern is coded
column after column from top to bottom. Therefore, in the vertical mode
the description starts with a 5-bit word indicating the length of the
pattern. A header bit indicates which mode is chosen, with a 0 for
horizontal mode and a 1 for vertical mode. We could also code the
pattern description using a code better matched to the source. This
would reduce the coding length, but at the expense of requiring a
specific code in place of a standard code.
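
The construction of the coding line and its reference line in the
horizontal mode can be sketched as follows. This is a minimal
illustration only: the function names and the representation of a
pattern as a list of rows of 0/1 pels are assumptions, and the CCITT
two-dimensional coder itself is not shown.

```python
WIDTH = 32  # every pattern row is extended to 32 pels


def build_coding_line(rows):
    """Flatten a pattern (list of rows of 0/1 pels) into one coding line."""
    line = [0]  # one white pel added at the beginning
    for row in rows:
        if len(row) > WIDTH:
            raise ValueError("pattern wider than 32 pels")
        # fill the right end of the row with 0's
        line.extend(list(row) + [0] * (WIDTH - len(row)))
    return line


def build_reference_line(coding_line):
    """Shift the coding line right by 32 pels (one pattern row), padding with white."""
    return [0] * WIDTH + coding_line[:-WIDTH]


pattern = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
coding = build_coding_line(pattern)
reference = build_reference_line(coding)
```

With this shift, the reference pel at any index is the pel directly
above it in the pattern, which is what allows a one-dimensional pel
sequence to be handed to a two-dimensional coder.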

5.4 Coding summary 

The coding procedure can be summarized in the following way: 

1. All the patterns isolated along a scan line are matched. 

2. At the end of the line, the matched patterns are sorted in order 
of increasing pattern identification number. The new library patterns 
are added at the end in sequential order. 

3. The patterns are coded and transmitted with the information 
sent in the following order: 

a. Horizontal position of pattern. 

b. Pattern identification. If it is a new pattern, the identification 
is sent only for the first new pattern on the line. 

c. A 1- or 2-bit code word to specify the vertical shift of a pattern, 
except if it is a new library pattern. 

d. For a new library pattern the following bits are sent: (1) a header 
bit indicating whether the horizontal or vertical coding mode is 
chosen, (2) a 5-bit word indicating the number of lines of the 
pattern to be coded, and (3) the CCITT two-dimensional coding 
of the pattern (see 5.2). 

e. After all patterns on a line have been sent, the special horizontal 
code word 111 indicates the end of the line. 

f. The library update is made according to 5.2.3. The patterns are
updated in order of increasing identification number. After
updating, all patterns with a number greater than 480 are
deleted, thus allowing for at least 32 new library patterns to be
added on the next line.
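
Steps 2 and 3f can be sketched as follows. This is only an
illustration: the record fields `x` and `lib_id`, the function names,
and the convention that identification numbers start at 1 are
assumptions, not taken from the paper. A limit of 480 with room for 32
new patterns suggests a library capacity of 512, but that figure is an
inference.

```python
LIBRARY_LIMIT = 480  # patterns numbered above this are deleted after each line


def order_line(patterns):
    """Order one line's patterns for transmission.

    patterns: list of dicts with keys 'x' and 'lib_id'; 'lib_id' is None
    for a new library pattern (hypothetical record layout).
    """
    matched = sorted(
        (p for p in patterns if p["lib_id"] is not None),
        key=lambda p: p["lib_id"],
    )
    new = [p for p in patterns if p["lib_id"] is None]  # kept in sequential order
    return matched + new


def prune_library(library):
    """Keep only patterns numbered 1..480; library[0] is assumed to be number 1."""
    return library[:LIBRARY_LIMIT]


line = [
    {"x": 40, "lib_id": 7},
    {"x": 90, "lib_id": None},
    {"x": 130, "lib_id": 3},
    {"x": 200, "lib_id": None},
]
ordered = order_line(line)
pruned = prune_library(list(range(1, 513)))
```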

Figure 15 is an example of message transmission. The different code 
words are summarized in Table II. 

Table II. Description of the code words for pattern matching coding

Code Definition                      Word Size (bits)  Description

Mode bit                             1                 Indicates whether there are any patterns
                                                       on the line.
Horizontal position                  11                Gives in binary the absolute position of
                                                       a pattern.
No more pattern                      3                 Indicates that there are no more patterns
                                                       on the line (this code word, 111, is a
                                                       special horizontal position code word).
Vertical move of pattern             1 or 2            Indicates whether the pattern must be
                                                       moved up or down by one line or is not
                                                       moved.
Library identification code          Variable          Defines which library pattern is
                                                       transmitted.
Library pattern description header   1                 Indicates whether the library pattern is
                                                       coded in horizontal or vertical mode.
Library pattern size                 5
Library pattern description          Variable          Slightly modified CCITT two-dimensional
                                                       code.



VI. SIMULATION RESULTS 

The important criteria are the compression and the quality of the
received documents. To evaluate them, the set of eight CCITT facsimile
documents is used. Their resolution is 7.7 pels/mm (200 pels/in.) in
both the horizontal and vertical directions. They have 1728 pels/line
and 2376 lines. Documents one, two, four, and five are shown in Fig.
16. All eight documents are shown in Ref. 5. For accurate comparison
with the matching technique by Pratt et al., 2 the simulations were also
made with an older nonofficial version of the CCITT documents,
which is similar except that each document has 1728 pels/line and 2128
lines.

6.1 Facsimile quality

In order to improve the quality of the decoded picture, a local 
filtering using a 3 X 3 window is applied. In addition, large library 
patterns are slightly expanded on their borders. This operation erases 
artifacts in large black regions. 

The encoding scheme modifies the binary picture. We must therefore
verify that the alterations are not visible or at least not annoying.
We can consider three picture alterations: wrong matches, matches
with a slightly distorted pattern, and wrong positioning. In the case of
a wrong match, a pattern is replaced by a different pattern. The only
wrong matches detected are between O and 0, dot and comma, and
I and 1, which even people cannot distinguish correctly without using
the context. Therefore, it can be considered that the system has
practically no wrong matches. A match with a slightly distorted pattern
can occur with characters. A character might match the same character
in a different font, or it might match the same character in a thinned or

Fig. 16(a). Original CCITT document one (first 2000 lines).

thickened character. Such matches, contrary to wrong matches, are
tolerable if they do not appear too often. Such distorted matches appear
when two slightly different fonts are used on the same page or when
the characters of a page come from a low-quality typewriter or scanner.
The wrong positioning of a pattern decreases the quality of the received
facsimile. No noticeable wrong positioning is observed for patterns
such as characters or other symbols. Some visible wrong positionings
are observed for nonsymbol patterns such as line segments, where the

Fig. 16(b). Original CCITT document two (first 2000 lines).

successive patterns make the lines slightly jagged. Figure 17 shows the 
same CCITT facsimiles as Fig. 16, but after transmission by pattern 
matching. It can be seen that there are no significant degradations. 
There are some slight irregularities in line drawings, as for example
in Fig. 17(d). A few distorted matches appear on CCITT document one
(Fig. 17(a)).

6.2 Compression 

To make an accurate comparison with both the symbol matching 
and two-dimensional coding techniques, the coding simulations have 
been made with both the official set of CCITT facsimile documents 

Fig. 16(c). Original CCITT document four (first 2000 lines).

and a former nonofficial version often used for facsimile compression
comparisons. Table III gives the coding lengths for both the official
and the nonofficial sets of CCITT documents, including the code length
of each kind of code word needed for pattern matching coding. Table IV
gives the compression ratios for the same CCITT documents and compares
them with the

Fig. 16(d). Original CCITT document five (first 2000 lines).

symbol-matching technique of Pratt et al. 2 and the two-dimensional
CCITT code. The results exclude synchronization and stuffing bits,
which is natural since pattern matching coding is intended for future
facsimile networks, such as those of Group four facsimile machines,
with fewer overhead bits. Therefore, the compressions of the
two-dimensional CCITT code and symbol matching have been corrected
by deleting the synchronization and stuffing bits and differ
from the values given in Refs. 2 and 5.

Fig. 17(a). Document one (first 2000 lines) after pattern matching.

Very high compressions are obtained, up to 80. The compression
is often double that of the two-dimensional CCITT code and is
sometimes 4.8 times higher. Depending upon the document, the
compression is 20 to 80 percent higher than that obtained with the
symbol matching technique by Pratt et al. 2 More detailed comparisons
and observations are useful when considering the performance of
facsimile coding by pattern matching:

1. An astonishing fact is the difference in compression observed

Fig. 17(b). Document two (first 2000 lines) after pattern matching.

between the old and the official versions of the CCITT documents. For
documents three and five the compressions are nearly twice as high
for the old version as for the official version. Significant
discrepancies are also observed for documents one and eight. This is
in spite of the fact that the old and official documents are the same
except that they were scanned differently. It can also be noted that
for the two-dimensional CCITT code, the difference in compression is
smaller than five percent except for document eight, where the
difference is about 20 percent. It must therefore be concluded that
the performance of the pattern matching coding technique is much more
dependent upon the scanning and binary thresholding than that of the
two-dimensional code. Observing both versions of documents three and
five, the main difference is that in the official

Fig. 17(c). Document four (first 2000 lines) after pattern matching.

version, characters are often clustered together, which leads to
incorrectly (or rather "inconveniently") isolated characters (as shown
in Fig. 18(a)), while for the old version, the characters are rarely
clustered together. However, sometimes a character in the old version
is split into several patterns because not all its pels are connected
(as shown in Fig. 18(b)). The old version of documents three and five
should have

Fig. 17(d). Document five (first 2000 lines) after pattern matching.

more patterns than the official version but fewer library patterns. In
fact, the old CCITT document three has 2199 patterns and 225 library
patterns, while the official version has 1945 patterns and 551 library
patterns.
The coding length of a library pattern is much greater (by a factor of 
about 10) than that of a nonlibrary pattern, which explains the 
difference in compression ratios. It can be concluded that pattern 
matching is much more dependent on the scanning quality and the 
thresholding than two-dimensional facsimile codes. 
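
A rough check of this explanation can be made from the pattern counts
alone. The sketch below assumes that every nonlibrary pattern costs one
unit and every library pattern costs exactly 10 units (the text only
says "about 10"), and it ignores all other code words, so it is an
order-of-magnitude illustration rather than a reproduction of Table
III. The function name is hypothetical.

```python
LIBRARY_COST_FACTOR = 10  # library pattern cost relative to a nonlibrary pattern


def relative_cost(total_patterns, library_patterns, factor=LIBRARY_COST_FACTOR):
    """Coding cost in units of one nonlibrary pattern."""
    matched = total_patterns - library_patterns
    return matched + factor * library_patterns


old = relative_cost(2199, 225)        # old version of document three
official = relative_cost(1945, 551)   # official version of document three
ratio = official / old
print(old, official, round(ratio, 2))
```

Under these assumptions the official version costs about 1.6 times as
much as the old one, which points in the same direction as the observed
difference in compression.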

Table IV. Comparison of compression ratios
(a) Official CCITT documents (2376 x 1728 pels)
(b) Nonofficial CCITT documents (2128 x 1728 pels)

Fig. 18. Characters isolated in an unwanted way. (a) Characters
clustered together. (b) Text containing characters isolated into
several symbols.

2. The increase in compression ratio compared to the two-dimen- 
sional run-length code is quite variable. For documents containing 
mostly handwritten drawings and text, such as documents two and 
eight, there is sometimes a slight decrease in the compression ratio. 
That is because there are few matching patterns. For example, for 
document two, there are 736 patterns, but 448 of them are library 
patterns. For documents containing mostly text, such as document 
four, the compression ratio increases by a factor of about 4.5. For 
documents containing a mixture of text and drawings, the increase 
varies between 35 and 220 percent, depending on the content and 



PICTURE CODING 2541 



thresholding. It should be noted that for document seven, which 
contains printed ideograms, the increase in compression ratio is 
smaller than for regular printed text because there are more ideograms 
than letters, but the compression ratio still doubled. 

3. The increase in compression ratio by pattern matching is 20 to
80 percent compared to the symbol matching of Ref. 2. The increase
has been obtained by a combination of several factors. The most
important are (a) isolation of nonsymbols (which leads to significant
improvement for documents three, four, five, and six, but has a slight
negative effect on documents two and eight), (b) better matching,
leading to fewer library patterns, and (c) improved coding efficiency
obtained by sorting the patterns and by other coding modifications.

By looking at the coding length necessary for the different kinds of 
code words, in Table III, it is clear that the predominant part of the 
code is used for the description of the library patterns, accounting 
generally for more than 60 percent of the total coding length. There- 
fore, improving it can bring the highest reward. The improvement can 
be obtained by reducing the number of library patterns or by coding 
the pattern description more efficiently. The next most bit-consuming 
part is the coding of the horizontal position; it uses about 20 percent 
of the total coding length. 

6.3 Complexity 

The pattern-matching coding has the disadvantage of being complex
and time-consuming, which is the price to pay for efficient coding. The
most time-consuming parts are the isolation, the template matching,
and the matching decision. The isolation is both complex and
time-consuming, and therefore the most difficult part, but by using fast
logic, it is possible to isolate all the patterns in about one second.
The template matching is a simple operation, but it takes a long time.
It is therefore less of a challenge, since it is easily done in
parallel and with simple hardware. The most time-consuming part of the
matching decision uses local operators on, for example, 3 x 3 windows
and can therefore also be realized without much complication. Most of
the high-level operations are much slower and can be done by
microprocessors. This system should not be more complicated than that
of Ref. 2.

An important factor is that the decoding is much easier and faster 
than the coding, since there is no isolation or matching. Such a 
technique is therefore particularly suited for transmission with one 
sender and several receivers. 
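
The simplicity of the decoder can be illustrated by a sketch in which
decoding a matched pattern amounts to copying a library bitmap into the
page. The function names, the use of the top-left corner as the
reference position, and the page representation as a list of rows are
assumptions made for illustration; the paper's postprocessing is not
included.

```python
def paste(page, pattern, x, y):
    """Set the black pels of a library pattern in the page, top-left corner at (x, y)."""
    for dy, row in enumerate(pattern):
        for dx, pel in enumerate(row):
            inside = 0 <= y + dy < len(page) and 0 <= x + dx < len(page[0])
            if pel and inside:
                page[y + dy][x + dx] = 1


def decode_line(page, library, placements):
    """placements: (x, y, lib_id) tuples referring to patterns already in the library."""
    for x, y, lib_id in placements:
        paste(page, library[lib_id], x, y)
    return page


page = [[0] * 6 for _ in range(4)]
library = {0: [[1, 1], [1, 0]]}
decode_line(page, library, [(1, 1, 0), (4, 2, 0)])
```

No isolation or matching step appears anywhere in this loop, which is
why the receiver can be much simpler than the sender.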

An experimental pattern matcher has been built to show that the
same kind of compression can be obtained when scanning real
documents. By using a mixture of custom logic and programmed logic,
transmission rates of up to 64 kb/s have been reached. A document
is then usually sent in one to two seconds.
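
The relation between compression ratio and transmission time at 64 kb/s
can be checked with a short calculation. It assumes the page size of
the official CCITT documents (2376 x 1728 pels) and no overhead bits;
documents scanned by the experimental machine may differ. The function
name is hypothetical.

```python
PELS_PER_PAGE = 1728 * 2376  # official CCITT document size
RATE_BITS_PER_S = 64_000     # 64 kb/s


def transmission_time(compression_ratio, pels=PELS_PER_PAGE, rate=RATE_BITS_PER_S):
    """Seconds needed to send one page at the given compression ratio."""
    coded_bits = pels / compression_ratio
    return coded_bits / rate


for ratio in (32, 64, 80):
    print(ratio, round(transmission_time(ratio), 2))
```

Under these assumptions, one to two seconds per page corresponds to
compression ratios of roughly 64 down to 32.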

VII. EXTENSIONS

Several improvements or different applications of a pattern matcher 
can be considered. Some of them will be described here. 

7.1 Multipage documents and prestored libraries

When transmitting several pages of a document, the library from
one page can be used for the next page, thus reducing the number of
library patterns for each page. In such cases, compressions up to eight
times higher than with conventional coding techniques can be
achieved. If a few fonts are prestored in the coder and decoder, the
compression can be increased significantly.

7.2 Very high-quality transmission 

It is possible to use a tighter matching algorithm when even slight
distortions cannot be tolerated. Such a mode can easily be implemented.
It reduces the compression by an average of 15 percent. In that case,
most of the postprocessing can be omitted.

7.3 Standardization 

The CCITT is looking into standardizing facsimile coding tech- 
niques for future facsimile machines communicating over digital links 
(Group four facsimile apparatus). The modified READ code (also 
called two-dimensional CCITT code) has been standardized. The 
pattern matching coding technique has been proposed by AT&T to 
the CCITT as an optional coding technique yielding much higher 
compression. The only difference in the proposal compared to this 
paper is that no cross decomposition is applied. The compression is 
therefore slightly lower. 

7.4 High-resolution graphics 

Future scanners and coders will probably include resolutions higher 
than 200 pels/in. They will probably use 300 and 400 pels/in. The 
pattern matching technique can easily be modified for such resolution. 
The maximum size of the patterns should be increased to keep the 
coding efficient. In addition, the codes for the positioning of patterns 
must be slightly changed. The matching algorithm would stay un- 
changed. Compared to conventional techniques, the improvement in 
the compression will be as high and often even higher at such resolu- 
tions. 
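
The effect of a higher resolution on the positioning code and on the
maximum pattern size can be estimated as follows. The sketch assumes
the same physical page width at every resolution, a horizontal position
sent as a fixed-length binary word, and a maximum pattern size scaled
in proportion to the resolution; none of these choices is specified in
the paper, and the function names are hypothetical.

```python
def position_bits(pels_per_line):
    """Bits needed to give any absolute position 0..pels_per_line-1 in binary."""
    return (pels_per_line - 1).bit_length()


def scaled(value, old_res, new_res):
    """Scale a length in pels from one resolution to another."""
    return value * new_res // old_res


for res in (200, 300, 400):
    width = scaled(1728, 200, res)       # pels per line
    max_pattern = scaled(32, 200, res)   # maximum pattern side in pels
    print(res, width, position_bits(width), max_pattern)
```

Under these assumptions the horizontal position grows from 11 bits at
200 pels/in. to 12 bits at 300 and 400 pels/in., and the maximum
pattern side grows from 32 to 48 and 64 pels.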

VIII. CONCLUSION 

A system for coding of facsimiles using pattern matching has been
described. It allows a significant increase in the compression ratio
compared with a symbol matching system 2 and gives a compression
ratio that is up to 4.8 times that of conventional facsimile coding
techniques. The improvement is naturally greater for printed text than
for handwritten text. It is felt that further significant improvements
are possible by better matching and coding. An important observation
is that pattern matching coding is very dependent on the digitization
and thresholding. Therefore, combining the thresholding and the
isolation could lead to significant improvements in compression.
Another consequence is that if a poor-quality scanner is used,
pattern matching will hardly lead to higher compression than
conventional facsimile codes. With modern electronic components, a
pattern matcher can be realized in hardware and would lead to an
important reduction in the transmission costs of high-volume
facsimiles.